Abstract


A gray-level image processing system has been constructed to provide capability for inspection, object
orientation, object classification, and interactive control tasks in an inexpensive, stand-alone system with
moderate processing speed. The POPEYE system offers a range of functions including algorithms for
preprocessing, feature extraction, image modeling, focusing, automatic pan, tilt, and zoom, interactive
communication with other devices, and convenient user interaction. The host processor is a Motorola 68000
processor  with  Multibus  communication  between  principal  modules,  an  image  data  bus  for  acquisition  and 
storage  and  a  pipeline  bus  for  image  preprocessing  and  programmable  transform  operations.  The  software 
structure  provides  hierarchical  control  over  multiple  i/o  devices,  file  management  of  system  storage,  an  image 
management  package  and  a  vector  package.  Performance  of  the  system  is  evaluated  using  convolution  filters, 
adaptive  modeling,  histogram  modification,  and  connectivity  analysis.  Cellular  logic  operations,  piecewise 
gradient  segmentation,  automatic  focusing,  and  adaptive  spatial  filtering  examples  are  described  in  detail.  The 
system  is  being  applied  to  a  number  of  practical  industrial  applications. 


1.  Introduction 


Image processing and computer vision systems offer tremendous potential in the development of
integrated systems which sense and adapt to external events. Visual feedback permits such robotic systems to
evaluate, plan and execute courses of action based on sensory perceptions. In practice, such capabilities allow
a robotic system to inspect and evaluate work in progress, to acquire and orient objects under visual control,
and to plan manipulation or navigation in complex environments.1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23


The application of computer-based vision systems and their integration into complex systems has been
limited by a number of factors inherent in current systems:

• SPEED. Most implementations require inspection speeds of about 1-10 seconds for manufacturing
tasks and less than 1 second for robot control tasks.

• FUNCTION. While existing systems do recognition of gross silhouette shape in binary systems or
image transformation and preprocessing in gray-level systems, no commercial systems do general
forms of gray-level object recognition or inspection.

• FLEXIBILITY. The nature of industrial inspection tasks varies widely and systems must be
inherently adaptable to many different tasks in order to be cost-effective.

• ROBUSTNESS. The system should offer robust performance under changing lighting or other
environmental conditions. Binary vision systems are particularly sensitive to such factors.

•  USER  INTERACTION.  The  system  should  provide  user  interactive  modes  of  operation  to  be  useful  as 
both  an  experimental  tool  for  the  development  of  applications  as  well  as  an  on-line  monitor  of 
inspection  results. 

• SYSTEM INTERACTION. Integration of a vision system into a more complex environment depends
strongly on the ability to interface and communicate. The lack of effective communications links
in many current systems impairs the speed and flexibility of resulting integrated systems.

• COST. The cost of both development and production-line systems affects the feasibility of
adoption. Current vision systems are major investments as components in a robotic system and have
discouraged many prototype industrial applications.

The  development  of  gray-level  vision  system  algorithms,  hardware,  and  software  is  still  a  difficult 
research task.24, 25, 26, 27, 28, 29 Algorithms for such scene interpretation and object identification exist only for
highly  structured  environments  and  have  most  often  been  developed  on  large,  general-purpose  computing 
machines. Imaging data is inherently complex due to the ambiguity which occurs between an observed
two-dimensional image and a given three-dimensional scene.30, 31, 32, 33 The observed image depends not only
on the geometry of the scene but also on light source geometry, surface orientation, surface reflectivity, and
spectral distribution. Practical experiments on object description from imaging data require two or three
cameras and significant assumptions about the scene characteristics.30, 33, 10, 11, 12, 14 At CMU we have
designed  and  constructed  a  gray-level  processing  system  which  will  serve  as  an  experimental  tool  in  the 
development  of  algorithms,  modular  hardware  elements,  and  interactive  software.  The  principal  goals  of  the 
system  are  to  provide  inexpensive  gray-level  capability  for  inspection,  object  orientation,  object  classification, 
and  interactive  control  tasks  in  a  stand-alone  system  with  moderate  processing  speed.  Inherent  in  these  goals 
were  decisions  not  to  build  special  purpose  hardware  for  the  basic  system  structure,  but  to  build  functional 
hardware  units  utilizing  commercially  available  components  wherever  possible.  The  software  structure 
should  provide  for  a  complete  range  of  system  functions  including  digitization,  frame  storage,  preprocessing, 
feature  extraction,  segmentation,  image  modeling,  classification,  automatic  focus,  pan,  tilt  and  zoom,  display, 
storage,  communications  with  other  automated  devices  and  convenient  user  interaction.  In  addition,  the 
software  structure  should  be  largely  independent  of  particular  modular  hardware  components  so  that 
hardware  enhancements  may  be  added  without  major  restructuring  of  the  software. 

The general characteristics of the resulting system are described in this paper. The system currently is in
routine use for algorithm development with particular attention to model-based approaches to object
orientation and classification. The system communicates with the Flexible Assembly Station34, an experimental
system  for  investigating  research  issues  in  sensor-based  assembly,  and  is  used  for  interactive  control  of  robots 
as  well  as  on-line  inspection  of  assembly  components.  The  gray-level  vision  system  has  been  applied  to  a 
number  of  specific  industrial  problems  under  funding  from  industrial  sponsors  and  affiliates  of  The  Robotics 
Institute. 

This paper provides an overview of the hardware and software organization of POPEYE, the CMU
gray-level vision system. It includes a quantitative evaluation of the basic system with some discussion of projected
enhancements by new board designs. Applications of the system in the performance of cellular logic
operations35, 36, piecewise gradient segmentation10, automatic focusing and adaptive spatial filtering are also
presented.

2.  Hardware 

There are a number of alternatives to consider in the design of a vision system. Some of the early work
was geared to the use of general purpose computers coupled to a frame-buffer display system. Although this
type of system offers advantages such as mass storage capabilities, extensive software libraries and good
operating systems that hide the hardware from the user, it tends to be too slow for on-line applications such as
industrial inspection tasks. The main characteristic of such systems is that the operations must be performed
serially in a single processing unit.

The  extreme  alternative  is  to  dedicate  a  processing  unit  for  each  picture  element26.  Designs  of  this  type 
have  proven  to  be  extremely  fast  but  difficult  to  program,  so  they  have  found  places  only  in  laboratories  or 
special  applications.  Other  alternatives  include  pipelined  and  parallel  multiprocessor  architectures  such  as 
crossbar switches and time-shared busses.37 The POPEYE vision system is a loosely coupled multiprocessor
system under the MULTIBUS convention. Figure 2-1 shows the block diagram of the main subsystems and
Figure 2-2 shows a photograph of the current POPEYE vision system.


Figure 2-2: Photograph of the POPEYE vision system


2.1. Main Processing Unit (MPU)

An in-house design based on Motorola's MC68000 16/32 bit microprocessor, the MPU functions as the
flow controller of the entire system. It features a 10 MHz CPU, 8 KB of EPROM for the monitor (see the
Software section) and 4 KB of RAM for the stack. It also features two serial lines and five counter/timers.
The serial lines are typically used to communicate with the user's terminal and the host computer, a DEC
VAX 11/750 running UNIX. Three of the timers are used by the system as a real time clock and the other two
are available to the user.

The MPU's functions include downloading code from the host computer to the other processors,
interaction with the user, handling real-time events and orchestrating the flow of information within the system.

2.2.  Main  Memory  (MM) 

Two memory boards, providing a total of 640 KB, comprise the system's main memory. The memory is
divided into 128 KB (Central Data Corporation's CDC-128K) used for programs and system utilities, and 0.5
MB (Chrislin Industries' CI-512) used for data. Space on the latter board is obtained from system calls to a
dynamic  allocation  package. 


2.3.  Secondary  Storage  (SS) 

A 10 MB Winchester drive (Shugart Associates' SA-1004) and a 1 MB floppy-disk drive (SA-800) give
the system 11 MB of on-line secondary storage. The disk controller, manufactured by Data Technology Corp.
(type DTC-1403D), may be connected to up to four drives. The data transfer is done via direct memory access
(DMA) between the MM and the disk controller's MULTIBUS adapter (DTC-86). The adapter controls the
transfer. Other features include copying data between the drives without going through main memory.

2.4.  Input/Output  Control  (IOC) 

The rest of the Input/Output (other than communicating with the user's terminal or the host computer)
is handled by a board made by Monolithic Systems Corp. This Z80-based I/O controller (MSC-8007) has 32
KB of dual-ported RAM which it uses to communicate with the MPU. The board's collection of I/O devices
includes three serial lines (normally connected to a printer, a bit-pad and a general purpose serial link) and
two parallel ports which are typically used to communicate with the Image Positioning subsystem described
below.

The IOC has a floating-point processor, capable of 10000-40000 flops, which is used mainly by the
on-board Z80. 32 KB of EPROM will contain the I/O drivers and some low-level algorithms for the Image
Positioning subsystem.

2.5.  Image  Acquisition  and  Display  (IAD) 

Four boards, all manufactured by Matrox Electronic Systems, Ltd., provide the capability of digitizing
and displaying images in real time (60 fields per second). The frame grabber (an FG-01) digitizes a 256 x 256
pixel  image  directly  from  the  TV  camera  with  up  to  256  levels  of  gray  (8-bit  quantization)  in  1/60  of  a  second. 
It  accepts  its  input  from  one  of  four  cameras  under  software  selection. 

The 8-bit picture elements (pixels) are transferred via a fast bus, hereafter called the Matrox Bus, to the
frame buffer (two RGB-256 boards) which continuously displays its contents on a TV monitor. Each board
holds four bits of the eight bit resolution. The frame buffer has both composite video and RGB outputs and
thus it may be used to display color or black-and-white images. The color map is fixed by the hardware,
which provides three bits for red, three for green and two for blue.
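As an illustration of the fixed 3-3-2 color map, the following minimal sketch packs an 8-bit-per-channel color into a single frame-buffer byte. The bit ordering (red in the high bits, blue in the low bits) and the function names are assumptions made for the example; the text specifies only the number of bits per primary.

```python
def pack_rgb332(r, g, b):
    """Pack 8-bit r, g, b values into one byte laid out as rrrgggbb.

    The rrrgggbb ordering is an assumption; only the 3/3/2 split is given.
    """
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def unpack_rgb332(pixel):
    """Return the 3-bit red, 3-bit green and 2-bit blue fields of a pixel."""
    return (pixel >> 5) & 0x7, (pixel >> 2) & 0x7, pixel & 0x3


pure_red = pack_rgb332(255, 0, 0)
white = pack_rgb332(255, 255, 255)
```

With only two bits for blue, the display can show four blue levels against eight levels each for red and green.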

The last board of the IAD is a one-bit overlay plane (MSBC-512) used to nondestructively display
cursors, viewport boundaries and other temporary objects. When an overlay pixel is set to 1 the corresponding
area of the screen is at full brightness, regardless of the pixel's frame-buffer value.


2.6.  Image  Positioning  (IP) 

In order to add flexibility to the IAD subsystem, the TV camera was mounted on a pan/tilt head (Vicon
V300PT) and fitted with a remote zoom/focus lens (Vicon V12.5-75). These two elements constitute the
image positioning subsystem used in object tracking and automatic focusing algorithms. A small hardware
interface connects the parallel port of the IOC to the standard controller (17 2V-SPP) provided by the
manufacturers of the head and lens. This provides the user with control over the pan and tilt parameters of
the head and the zoom and focus parameters of the lens. The pan/tilt head is large enough to hold two
cameras for stereo vision applications.

2.7.  Array  Processor  (AP) 

An array processor was added to POPEYE for number crunching applications. The two-board set
manufactured by Sky Computers, Inc. (SKYMNK-M) is capable of up to 1 Mflops and it is utilized by the
system to perform vector calculations and Fourier analysis on raw data.

The AP has a rather sophisticated DMA controller to retrieve the data and store the results in main
memory. It is possible to specify not only the number of consecutive words (n) but a number of words (m) to
be skipped before retrieving the next n words. The user may also specify the number of (n + m) combinations
to  be  used  in  a  single  command.  This  complex  addressing  scheme  is  especially  useful  for  image  processing 
tasks. 
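The (n, m, count) addressing scheme described above can be sketched in a few lines. This is a software model of the addressing pattern only, not the controller's actual command format; the function name and the row-major image layout are assumptions made for the example.

```python
def strided_gather(memory, start, n, m, count):
    """Collect `count` runs of `n` consecutive words, skipping `m` words
    after each run, beginning at word address `start`."""
    out = []
    addr = start
    for _ in range(count):
        out.extend(memory[addr:addr + n])
        addr += n + m
    return out


# Model a 256 x 256 image stored row by row, one pixel per word.
# Each word holds its own address so the result is easy to inspect.
WIDTH = 256
image = list(range(WIDTH * WIDTH))

# A 3 x 3 window whose top-left corner is at row 10, column 20:
# read 3 words, skip the remaining 253 words of the row, three times.
window = strided_gather(image, start=10 * WIDTH + 20, n=3, m=WIDTH - 3, count=3)

# A full image column: read 1 word, skip 255, once per row.
column = strided_gather(image, start=20, n=1, m=WIDTH - 1, count=WIDTH)
```

A single command of this form fetches a rectangular subimage or a column without the host having to issue one transfer per row, which is why the scheme suits image data.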

2.8.  Image  Pre-processing  Units  (IPUs) 

When implementing image pre-processing algorithms, one often has to deal with very large amounts of
data and, while the operations tend to be simple and repetitive, it is necessary to perform them very quickly to
achieve the required overall performance.

We are constructing two Image Pre-processing Units (IPUs), each consisting of an MC68000 processor, an
image page and a pipeline page (see Figure 2-3). Each of the two pages is 64 KB long so they can
accommodate a 256 x 256 x 8 bit image. The image page may be loaded from, or dumped to, the Matrox Bus in
1/60th of a second. It is normally used to hold the input data to be processed by the MC68000 processor or
the results of the pre-processing algorithm.
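The page size and the transfer rate implied by these figures can be checked with a short calculation. This is a back-of-the-envelope sketch: it assumes the full 1/60 second is available for the transfer and ignores any video blanking intervals, so the real burst rate on the Matrox Bus may be higher.

```python
ROWS, COLS, BITS_PER_PIXEL = 256, 256, 8
FIELD_TIME_S = 1 / 60  # one video field

# Size of one image in bytes and in KB (1 KB = 1024 bytes).
page_bytes = ROWS * COLS * BITS_PER_PIXEL // 8
page_kb = page_bytes // 1024

# Average rate needed to move one page in one field time.
bus_rate_mb_per_s = page_bytes / FIELD_TIME_S / 1e6

# Average time available per pixel, in nanoseconds.
ns_per_pixel = FIELD_TIME_S / (ROWS * COLS) * 1e9

print(page_bytes, page_kb)
print(round(bus_rate_mb_per_s, 2), round(ns_per_pixel))
```

One image therefore fills a 64 KB page exactly, and loading it in one field time corresponds to an average of roughly 3.9 million bytes per second.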

The  on-board  12  MHz  MC68000  has  32  KB  of  RAM  from  which  it  executes  instructions.  This  memory 
and the image page are mapped into the MULTIBUS memory space so they may be loaded or read by the
MPU. In a normal application, the MPU first downloads code from the host computer into this program
memory  so  the  IPU  can  execute  it. 

Figure 2-3: Block Diagram of the Image Pre-processing Units

The other 64 KB of memory, the pipeline page, is only accessible to the on-board MC68000 and a fast
bus called the Pipeline Bus. It is thus possible to connect the two IPUs back to back by means of the pipeline
bus. In such a configuration, one IPU would receive the raw image in its image page and perform a
pre-processing algorithm storing the results in its pipeline page. The other IPU would take the data from that
pipeline page and perform a second pre-processing algorithm putting the results in its image page from which
they may be displayed in 1/60 of a second. This is possible because each MC68000 has access to the pipeline
bus, and thus to the other IPU's pipeline page.

2.9.  Programmable  Transform  Processor  (PTP) 

A  number  of  vision  algorithms  require  that  an  image  be  transformed  either  logically  or  mathematically. 
Most of these transforms are relatively straightforward, applying a number of simple operations to a
neighborhood around the pixel being analyzed.

The PTP is a microprogrammable processor specifically designed to implement either logical or
mathematical transforms over a programmable neighborhood. A block diagram of the PTP is shown in Figure 2-4.
It is capable of convolving a 3 x 3 mask with the full image in less than 240 msecs or running a cellular-logic
cycle in little over 100 msecs. The design includes a 3 x 4 pixel pipeline, an 8 x 8 flash multiplier, an 8-bit
ALU and a powerful neighbor-address generator which may calculate up to 16 neighbor-pixels' addresses in
parallel to the main computations. The control store holds 1K 64-bit µwords and is mapped onto the
MULTIBUS; it is loaded by the MPU during an initialization phase. It is implemented with very high
speed RAM permitting typical microcycle times of less than 200 nsecs.

Figure 2-4: Block Diagram of the Programmable Transform Processor
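To make the neighborhood operation concrete, the following sketch convolves an image with a 3 x 3 mask in plain software and then estimates the per-pixel microcycle budget implied by the figures quoted for the PTP. The function is an illustrative reference model, not the PTP microcode; leaving border pixels at zero is an assumption, and the 240 msec and 200 nsec values are the upper bounds stated in the text.

```python
def convolve3x3(image, mask):
    """Convolve a 2-D list of integers with a 3 x 3 mask.

    Border pixels, which lack a full neighborhood, are left at 0
    (an assumption made for this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # The mask is indexed in reverse, which makes this a
                    # true convolution rather than a correlation.
                    acc += mask[1 - dr][1 - dc] * image[r + dr][c + dc]
            out[r][c] = acc
    return out


# Timing budget: one 256 x 256 image in 240 msecs, 200 nsec microcycles.
PIXELS = 256 * 256
us_per_pixel = 240e-3 / PIXELS * 1e6
microcycles_per_pixel = us_per_pixel * 1000 / 200
```

Under these bounds each output pixel has about 3.7 microseconds, or roughly 18 microcycles, for its nine multiply-accumulate steps. That is about two microcycles per mask term, which suggests why neighbor addresses are generated in parallel with the arithmetic.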

2.10. Future enhancements

In the future we will add a 10 Mb Ethernet controller to speed up the communication link to the host as
well as to give the system access to a number of resources available at Carnegie-Mellon University. Within the
Robotics Institute we will have a Three Rivers Computers Corp. PERQ and several special processors linked
via the 10 Mb Ethernet. Also, a gateway to the 3 Mb Ethernet is planned which would link us to more than a
dozen VAXen and other resources, including a 60 page per minute laser printer.

For color vision, we have acquired a filter wheel which will enable us to obtain three component color
images corresponding to the three primary hues. A controlled-lighting environment is planned to perform
critical experiments.

3.  Software 

The software for the POPEYE vision system can be divided into four levels: host level support, device
level support, object level support and applications programming. (Refer to Figure 3-1.) Each level consists
of several programs and subroutine libraries. The total software effort has grown to approximately 400 pages
of code, written mostly in C, all of which was written, edited and compiled on the host computer. This
machine serves as a support facility for several projects of this type, running C cross-compilers for four
different machines. In addition, it is linked to CMU's Ethernet, allowing it to keep abreast of system software
updates, bulletin board information and electronic mail traffic.

Much of the software for POPEYE was consciously patterned after similar components in UNIX; in several
cases, we were able (or forced) to port source code from the VAX to the POPEYE system.

Software engineering practices are strongly adhered to throughout the vision system software, including
manual  entries  for  each  program  and  subroutine,  a  header  page  for  each  module  of  source  code  and  verbose 
and  plentiful  comments. 

3.1. Host Level Support

3.1.1.  Editing  and  Compiling 

All the programs that run on POPEYE's main processing unit are written, edited and compiled on the host
machine. Almost all the code is written in C, with only small utilities where efficiency is a major consideration
being written in M68000 assembly language.

The C cross compiler package for the 68000 is very similar to the native C compiler for the VAX in that it
consists of a translator, post optimizer, assembler, linking loader, and symbol table maintainer. The loader
uses the same subroutine library format as the UNIX loader, which allows us to use the same archiver. The
cross compiler loader also allows external symbol references to be resolved by searching the symbol table files
from other programs, something which is very useful in generating programs for a single process, single user
environment. Often, a program which tests the algorithm of the day may be changed, recompiled,
downloaded and executed every few minutes, so it helps to divide the program into two segments. A small
piece which contains only the algorithm implementation can be quickly recompiled and downloaded, while a
second, larger piece containing support utilities such as image display subroutines can sit in main memory
unchanged. This is a great boon, as downloading code even at 9600 baud is painfully slow.

Figure 3-1: Software Configuration of CMU's POPEYE vision system


3.1.2.  Debugging 

Another important piece of host level support is the symbolic debugger. Building a debugger for our
environment proved to be a much more complicated task than building a standard debugger, since the host
machine must communicate with the MPU in the vision system, polling memory locations, stopping and
restarting execution, single stepping either through assembly language instructions or through lines of source
code and setting and deleting break points. Thus, the debugger is actually a distributed software system, or a
"cross-debugger" (see Figure 3-2).

Figure 3-2: Representation of the Cross-Debugger System


At  present,  when  a  program  dies  unexpectedly,  the  monitor  prints  a  cryptic  diagnostic  on  the  user 
terminal which shows the contents of the program counter, status register and possibly some other
information. Given the address where the program died, the debugger will search the symbol table file for that
program, figure out which subroutine contains the address and disassemble the subroutine. Like its UNIX
counterpart, the debugger can manipulate several programs with their associated symbol tables and
executable segments.
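The step of finding which subroutine contains a given address can be sketched as a search over a sorted symbol table. The table contents and symbol names below are hypothetical, and the use of a binary search is an assumption; the text does not say how the debugger performs the lookup.

```python
import bisect


def find_subroutine(symbols, address):
    """Return the name of the subroutine containing `address`.

    `symbols` is a list of (start_address, name) pairs sorted by address.
    Returns None if the address lies below the first symbol.
    """
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    return symbols[index][1] if index >= 0 else None


# Hypothetical symbol table for a downloaded program.
symbols = [
    (0x1000, "_main"),
    (0x1080, "_grab_frame"),
    (0x1200, "_histogram"),
]

# Program counter value reported by the monitor after a crash.
crashed_in = find_subroutine(symbols, 0x10A4)
```

Because the table in this sketch records only start addresses, an address beyond the end of the last routine is still attributed to that routine; a fuller version would also store each routine's length.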

The  compiler  also  supports  the  debugging  effort  by  placing  labels  in  the  assembly  language  output  that 
correspond to the beginning of each line of source code. This allows the debugger to execute the program on
a line by line basis.

The debugger is incompletely implemented at present; future plans include extending it to its full
interactive capability.

3.1.3. Downloading and Uploading

At the end of the compilation process, an extra phase of the C cross-compiler produces an ASCII version
of the executable program in Motorola VERSABUG format. At the request of the MPU, the host machine
dumps this file over the serial line connecting the two processors. The MPU executes a subroutine which
reads the file, decodes the VERSABUG records and loads the executable code into main memory. This again
is a distributed software system, though not nearly as complicated as the debugger.
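The decoding step performed on the MPU side can be illustrated as follows. This sketch assumes that the VERSABUG load format corresponds to the standard Motorola S-record layout (record type, byte count, address, data, one's-complement checksum); the paper does not spell out the record structure, and the function names are invented for the example.

```python
def decode_srecord(line):
    """Decode one S-record line into (address, data bytes).

    Returns None for records that carry no code bytes (header, count
    and termination records). Raises ValueError on a malformed record.
    """
    line = line.strip()
    if not line.startswith("S"):
        raise ValueError("not an S-record")
    # S1, S2 and S3 data records use 2-, 3- and 4-byte addresses.
    addr_len = {"1": 2, "2": 3, "3": 4}.get(line[1])
    if addr_len is None:
        return None
    raw = bytes.fromhex(line[2:])
    count, body, checksum = raw[0], raw[1:-1], raw[-1]
    if count != len(raw) - 1:
        raise ValueError("byte count mismatch")
    if (~(count + sum(body)) & 0xFF) != checksum:
        raise ValueError("checksum mismatch")
    address = int.from_bytes(body[:addr_len], "big")
    return address, body[addr_len:]


def load(lines, memory):
    """Copy the data bytes of every data record into `memory`."""
    for line in lines:
        record = decode_srecord(line)
        if record is not None:
            address, data = record
            memory[address:address + len(data)] = data


# Two 68000 instruction words (NOP, RTS) placed at address 0x0010,
# followed by a termination record.
memory = bytearray(32)
load(["S10700104E714E7566", "S9030000FC"], memory)
```

Sending the program as ASCII hexadecimal roughly doubles the number of characters on the serial line compared with the binary image, which contributes to the slow download times noted earlier.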

In addition to trading in VERSABUG format, the host machine also implements a generalized
upload/download protocol designed to support the debugger communications and the transfer of image data.
The black and white camera attached to POPEYE can be used with color filters to obtain component color
images, which can then be uploaded to the host machine, recombined and displayed on the Grinnell color
frame buffer system.

3.1.4. Language Development

Many of the applications programs for POPEYE are simple enough to need only a single character menu
driven input paradigm. In certain cases, however, the input is structured enough to warrant a parser and/or a
lexical analyzer. The host UNIX system has tools for building just such items, which output code in
C. With only minor modifications relating to i/o, this code can be cross-compiled and executed on POPEYE's
MPU. An example of a program which uses both the parser generator and lexical analyzer generator will be
described later.

3.1.5.  Hardcopy 

Often,  hardcopy  of  some  entity  such  as  an  image,  a  line  scan  or  a  histogram  plot  is  desired.  The  high 
resolution laser printer connected to CMU's Ethernet is used for this purpose. The information is uploaded
to the host in one of several data formats, converted by some program or sequence of programs into a
printable file and finally shipped over the Ethernet to the printer. The printable files can also be included as
illustrations  in  documents.  Because  of  printer  limitations,  images  must  be  binarized  before  being  printed. 


3.2.  Device  Level  Support 


3.2.1.  The  Monitor 

At the heart of the device level software lies the monitor. This program is stored in EPROM in the MPU
and is executed on power-up and on receipt of fatal exceptions such as bus errors. The monitor provides
enough  capability  to  download  and  execute  programs  through  the  implementation  of  the  following  features. 

• TALK-THRU MODE. The monitor can make a software connection between the two serial lines on
the MPU board to allow the user access to the host as if there were no vision system between the
two. This is the mode of operation during logins, editing and compiling. After editing and
recompiling a program, the user can exit talk-thru mode and return to POPEYE.

• DOWNLOADING. When the user wishes to download and execute a program, he gives the name of
the program to the monitor. The monitor requests the program from the host and enters
download mode. During the downloading process, the monitor takes apart the VERSABUG
format file produced by the cross-compiler and sets the executable code into main memory. If
desired, the monitor will automatically execute the program at the end of the file transfer. If the
execution of the program is successful, the monitor regains control in normal mode after
termination. If not, the monitor regains control through an exception handler, prints a message to the user
terminal and again returns to normal mode.

• DEBUGGING. For simple hand debugging jobs, the monitor allows the user to examine and
change the contents of memory on an 8, 16 or 32 bit word basis. In the future, the monitor will
also support the lowest level of the cross-debugger communications protocol. This is a
particularly difficult problem since communications between the user terminal and the host must be
maintained while silently allowing the debugging program on the host to access the contents of
main memory. (Refer back to Figure 3-2.)

• DYNAMIC MEMORY ALLOCATION. To make the applications programs smaller, cleaner and easier
to write, a dynamic memory allocation package was installed in the monitor. The package is
initialized  before  the  execution  of  each  program  and  provides  whatever  space  the  program  may 
request  for  temporary  storage.  For  example,  the  image  manipulation  package,  to  be  described 
shortly,  uses  the  allocator  to  obtain  space  for  storing  image  data  in  main  memory. 

To maintain independence of hardware configuration, the monitor knows nothing about any hardware
outside  of  the  MPU. 
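The behavior of the dynamic memory allocation package, which is initialized before each program and then hands out temporary storage on request, can be modeled with a very simple allocator. The sketch below is a bump allocator chosen for brevity; the actual allocation algorithm is not described in the text, and the base address, class name and method names are hypothetical.

```python
class Allocator:
    """Bump allocator over a fixed region of data memory."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.reset()

    def reset(self):
        """Called before each program runs: all space becomes free again."""
        self.next_free = self.base

    def alloc(self, nbytes, align=2):
        """Return the start address of a new block, or None if out of space.

        Blocks are aligned to `align` bytes (the 68000 requires word
        accesses to fall on even addresses).
        """
        start = (self.next_free + align - 1) // align * align
        if start + nbytes > self.base + self.size:
            return None
        self.next_free = start + nbytes
        return start


# Model the 0.5 MB data board; the base address 0x20000 is hypothetical.
heap = Allocator(base=0x20000, size=512 * 1024)
IMAGE_BYTES = 256 * 256  # one 8-bit image
images = [heap.alloc(IMAGE_BYTES) for _ in range(9)]
```

With these figures the data board holds at most eight full 256 x 256 images at once; in this model the ninth request fails until the allocator is reset for the next program.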

3.2.2. Device Drivers

The remainder of the device level support layer is a collection of device drivers for the various hardware
subsystems described in section 2. The drivers are stored on the disk as files and read into main memory
when a particular device is opened.

• The serial i/o package communicates with the terminal, host, printer, bitpad and general purpose
serial line. Serial i/o is interrupt driven.

• The parallel i/o package communicates with a special purpose hardware interface to provide the
MPU with control over the pan/tilt head and the motorized zoom lens. Thus, a user program can
independently control the pan and tilt angles and the zoom and focus parameters of the lens. A
tracking program which exercises this control will be described in section 5. Parallel i/o may be
interrupt driven or polled.

• The disk i/o package handles the lowest level of data transfers to and from the disks and consists
of a primitive space manager and the interface to the DMA controller. A copy command is
available to make disk backups simple.

• The frame i/o package talks to the image acquisition and display subsystem, controlling the
transfer of data to and from the frame buffer and main memory and the grabbing of frames from
up to four television cameras.

•  The  array  processor  i/o  package  merely  sets  up  DMA  commands  for  the  hardware. 

3.3.  Object  Level  Support 

The object level support layer consists of the vector manipulation package, the file handling system and
the image manipulation package. Together, these three pieces provide user programs with an elegant
interface to the hardware capabilities of CMU's POPEYE vision system.

3.3.1. The Vector Manipulation Package

The vector manipulation package is the simplest of the three pieces and provides access to the capabilities
of the array processor subsystem without the headaches of talking directly to the hardware. The hardware is
manipulated at the lowest level by vendor supplied microcode which resides on the Multibus boards.
Above the microcode lies the device driver, and above the driver lies a layer of assembly language
subroutines, supplied in part by the vendor as a library. These routines implement functions such as data
format conversion, vector algebra routines and the FFT algorithm.


3.3.2.  The  File  Handling  System 

POPEYE's file structure is one of the parts that was consciously modeled after UNIX. In the spirit of UNIX,
it unifies the myriad of details relating to both disk storage and program i/o into a single framework. This
allows the devices attached to the system to be regarded as files. Input to a running program (a process)
always comes from a file, but often the "file" actually points through to the user terminal. Pulling the next
character from the input causes the serial line device driver to get a character from the terminal. Likewise, the
output from a process always goes to a file, but again, the file could actually be the terminal.

Our primary motivation for attaching a disk controller to the vision system was the need to store images.
In addition, once the size of our applications programs grew to the point that downloading became
uncomfortable, the natural thing to do was to store the programs on disk. Our first inclination was to buy a
UNIX-like operating system for the 68000 and be done with worrying about files. Unfortunately, an
operating system often slows down the raw speed of a computer system, thus diminishing its performance. It was
decided that POPEYE would be a single user, single task machine. In addition, after researching the details of
file storage on UNIX, we decided that certain aspects of the file system were unattractive. We had grown
accustomed to the fast image access that comes from contiguous file storage. In UNIX, files can be fragmented
and strewn about all over the disk. In a multiprocess, multiuser environment where garbage compacting is
impractical, this storage scheme makes sense. In our environment, however, speed of access is more highly
valued. What we ended up with is a file system with the convenient tree structure of UNIX, along with the
option of specifying files to be contiguous.

3.3.3.  The  Image  Manipulation  Package 

The last and largest piece of the object level support layer is the image manipulation package, a
subroutine library which provides primitives for the manipulation of images on disk, in main memory and on the
screen. The following conventions have been established. (Refer to Figure 3-3.)

A collection of pixels on disk is called an image. To the file handling system, an image is just another file,
save that it is stored contiguously. The contents of the file can be created by any means: grabbing frames
from the camera, processing another image and random number generation are all valid means of image
creation. Presently, images are constrained to be a multiple of 16k bytes in length. This means that a 256 x
256 pixel image typical of POPEYE is of length 4, while a 512 x 512 pixel image typical of the Grinnell system
attached to the Vax is of length 16.
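
The length arithmetic above can be checked with a short sketch. It assumes one byte per pixel (the images are described elsewhere as 8-bit); the names `UNIT_BYTES` and `image_length_units` are illustrative and not part of the POPEYE software.

```python
UNIT_BYTES = 16 * 1024  # image files are whole multiples of 16k bytes


def image_length_units(rows, cols, bytes_per_pixel=1):
    """Return the image length in 16k-byte units (assumes 1 byte per pixel)."""
    size = rows * cols * bytes_per_pixel
    if size % UNIT_BYTES != 0:
        raise ValueError("image size is not a multiple of 16k bytes")
    return size // UNIT_BYTES


print(image_length_units(256, 256))  # 4
print(image_length_units(512, 512))  # 16
```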

To process an image, the pixels must be moved from disk to main memory, where they reside in a
window. Windows can be of arbitrary size and shape. The pixels are again stored contiguously. To aid in
processing a window, there exists another object, a rectangular subset of the pixels in a window called a pane.

Figure 3-3: Representation of the Image Manipulation System

Once the pixels in a window have been moved to main memory, the pane can be moved about within the
window, thus eliminating the need to reread the pixels from the disk or from the frame buffer each time the
area of interest changes.
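
One way to picture the window and pane relationship is the sketch below. It is only an illustration under stated assumptions: the window is taken to be rectangular and row-major, and the class names `Window` and `Pane` and their methods are hypothetical, not the actual package interface. The point is that moving a pane changes only an offset; no pixel is copied or reread.

```python
class Window:
    """Pixels held contiguously in main memory (assumed row-major)."""

    def __init__(self, rows, cols, pixels):
        if len(pixels) != rows * cols:
            raise ValueError("pixel count does not match window size")
        self.rows, self.cols, self.pixels = rows, cols, pixels


class Pane:
    """Rectangular subset of a window; stores only a position and a size."""

    def __init__(self, window, top, left, rows, cols):
        self.window, self.rows, self.cols = window, rows, cols
        self.move_to(top, left)

    def move_to(self, top, left):
        """Reposition the pane; the window's pixels are not touched."""
        w = self.window
        if not (0 <= top <= w.rows - self.rows and 0 <= left <= w.cols - self.cols):
            raise ValueError("pane must stay inside its window")
        self.top, self.left = top, left

    def get(self, r, c):
        """Read the pixel at pane coordinates (r, c) straight from the window."""
        w = self.window
        return w.pixels[(self.top + r) * w.cols + (self.left + c)]


window = Window(4, 4, list(range(16)))
pane = Pane(window, 0, 0, 2, 2)
print(pane.get(1, 1))  # 5
pane.move_to(2, 2)
print(pane.get(0, 0))  # 10
```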

To view the contents of a window on the monitor, a viewport is created and linked to the window.
Viewports must have identical dimensions to the windows to which they are linked, but are free to occupy any
position on the screen. The size and location of a viewport may be changed interactively by using the cursor
movement commands of the terminal. Several viewports may be linked to a single window. Changes made to
the contents of a window will be reflected in each viewport to which it is linked.

The  last  type  of  object,  the  Cursor,  is  used  for  pointing  to  specific  locations  on  the  screen. 

3.4.  Application  Programs 

The remainder of the vision system software is a collection of application programs and subroutines. A
large piece in this category is a subroutine library full of garden-variety image processing algorithms such as
high pass and low pass filter convolution kernels, the Sobel edge detector, a temporal averaging subroutine to
reduce the effects of camera noise, histogram manipulation subroutines, a contrast enhancement package,
binarization and cellular logic transform operators and a temporal differencing subroutine. All of these
subroutines operate on one or more of the objects described previously.

Above this rather standard library is a collection of more advanced image processing algorithms which
we have written for our own purposes.

• The standard binary cellular logic idea has been extended to operate on grey scale images,
resulting in Adaptive Cellular Logic, or ACL. This is useful for performing a more intelligent
binarization than can be obtained by simple thresholding as well as for edge detection and blob smoothing
in grey scale images.

• Several data compression schemes have been implemented for the purpose of reducing the
amount of processing necessary to perform pattern recognition to a level compatible with real
time control. This is the subject of section 5.2.

•  A  small  interpretive  language  for  multipass  image  filtering  has  been  specified  and  implemented. 
This  is  described  in  section  5.4. 

• A large support program of the type described earlier in conjunction with compiling and
downloading has been provided as a base for algorithm development. This program contains
most of the subroutines described above, including the software for controlling the pan/tilt head
and zoom lens, so that test programs may remain small. The support program is capable of
downloading and executing test programs without returning to the monitor, and so does not have
to be reinitialized after each program call.

• A general purpose command interpreter package has been written to make the construction of
menu driven programs as painless as possible. The package includes facilities for recognizing and
executing commands, changing variables during execution and on-line help information. As
mentioned earlier, users intending to build programs for general use (especially demonstration
programs) are encouraged to use this package. Thus, some uniformity between pieces of
application software is achieved. First-time users have little or no trouble running demonstration
programs on POPEYE.

• A simple tracking algorithm utilizing the image positioning system was implemented to see how
close the processing power of the vision system could pull toward real time. The program grabs a
frame from the camera and simultaneously binarizes and computes the area and center of energy
while reading the pixels from the frame buffer. The area and center of energy are compared to
their previous values and the differences used to deliver control signals to the image positioning
system. Movement in the x direction generates pan signals, movement in the y direction generates
tilt signals and movement in the z direction (change in area) generates zoom signals. While
processing the full 256 x 256 pixel frame size, the sampling period is just under one second and all
processing is done in the MPU. To achieve faster rates, some of the computations should be
transferred to the IPUs and the PTP.

•  Two  automatic  focusing  algorithms  have  been  implemented.  These  will  be  described  in  section 
5.3. 
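
The tracking loop described in the list above can be sketched as follows. This is a minimal illustration, not the POPEYE implementation: the threshold convention (`>=`), the use of the centroid of the binarized pixels as the "center of energy", and the function names are assumptions, and the differences are returned as raw numbers rather than scaled into real pan, tilt and zoom commands.

```python
def area_and_center(frame, threshold):
    """Binarize while scanning and accumulate area and center in a single pass."""
    area = sum_x = sum_y = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value >= threshold:  # assumed binarization rule
                area += 1
                sum_x += x
                sum_y += y
    if area == 0:
        return 0, None, None
    return area, sum_x / area, sum_y / area


def control_signals(previous, current):
    """Turn frame-to-frame differences into pan, tilt and zoom corrections."""
    (a0, x0, y0), (a1, x1, y1) = previous, current
    if a0 == 0 or a1 == 0:
        return None  # object not visible in one of the frames
    return {"pan": x1 - x0, "tilt": y1 - y0, "zoom": a1 - a0}


def make_frame(left):
    """Hypothetical 6 x 6 test frame with a bright 2 x 2 blob."""
    frame = [[0] * 6 for _ in range(6)]
    for y in (1, 2):
        for x in (left, left + 1):
            frame[y][x] = 200
    return frame


before = area_and_center(make_frame(1), 128)
after = area_and_center(make_frame(2), 128)
print(control_signals(before, after))  # {'pan': 1.0, 'tilt': 0.0, 'zoom': 0}
```
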

In many industrial production environments it is desirable to use automated vision systems for inspection
jobs which are considered dangerous, boring or unreliable when carried out by humans. A number of the
application programs have come from the implementation of industrial inspection algorithms for these tasks.
Typically, a concept demonstration is carried out that evaluates speed of performance, computational
complexity and cost of implementation. The application packages written for this system have served not only to
demonstrate the feasibility of specific inspection algorithms, but have also driven the software development of
the system to a significant extent. Many of the amenities now present on the system were originally
developed for specific demonstrations. Conversely, several of the image processing algorithms developed and
implemented for research purposes have found their way into industrial inspection packages.

3.5.  Future  Plans 

The following pieces of software are expected to be integrated into CMU's POPEYE vision system
environment in the near future.

• The Ethernet i/o package will provide a device level interface to the 10 megabyte Ethernet when
the capability becomes necessary. The Ethernet will be needed for high speed data transfers
between POPEYE and the Perq. The Perq has a high resolution bit mapped screen and a
microprogrammable processor, making it a desirable complement for the vision system.

• The IPUs installed in the system require simple device drivers. The existing software for
downloading code will be used to load the 32 kB program space. (Refer back to section 2.8).

• Since the PTP is a microcodable machine, it requires a microassembler. This is a medium sized
development project. The microassembler should provide for the symbolic manipulation of
microinstructions and perform rudimentary error checking to prevent the programmer from
damaging the hardware. In addition, we plan to define a microsubroutine format for use with an
archiver and linking loader so programmers may build libraries of useful transform subroutines.

• After all the hardware, device drivers and support software become operational, we will be faced
with a familiar but difficult problem: programming a multiprocessor system. This is a major
research problem we do not expect to solve the first time around. We would like to see support
for multiprocessing in the form of an editor, a compiler and a debugger. Ada is being considered
as a language for multiprogramming, although a custom extension to C may be in order. Our first
approach, however, will be to write some applications software and use it to evaluate the extent to
which multiprocessor support is needed.


4.  Performance 

It is always difficult to evaluate a computer system since every architecture has its strong and weak
points. The problem is more complex if the system to be evaluated is a multiprocessor, as in our case. In our
discussion about the system's performance, we chose to evaluate the system in the context of its applications.

The vision system was specifically designed to be used in image processing tasks so it seems useful to
compare it with other systems used in those tasks. When appropriate, we will perform comparisons with a
display-type system consisting of a frame buffer (like those manufactured by Grinnell or De Anza) and a
general purpose computer (typically a single-user PDP-11 or a multi-user VAX 11). We will also try to
compare POPEYE with an analysis-type system such as those manufactured by Vicom or Quantex that execute
a number of pre-defined algorithms very quickly.

It is important at this point to note that since the POPEYE vision system was designed to be a tool in the
development and testing of vision algorithms, it was essential that it be programmable. The system was not
intended to be used for any other purpose, unlike the Vax host of the display-type system. With this in mind,
we'll look at four image processing tasks: convolution filters, adaptive modeling, histogram modification and
connectivity.

4.1.  Convolution  Filter 

In this type of problem, a 3 x 3 mask is convolved with a 256 x 256 pixel image. This is a repetitive
operation that may be implemented in hardware. Since it is commonly used, most analysis-type machines
have such a hardware device. Therefore they are able to perform the convolution in real time (30 msecs).

Assuming that the image has been acquired already, the vision system is able to do the convolution and
display the result in 300 - 350 msecs, which compares favorably with a display-type machine. Our
Grinnell-VAX 11/780 combination takes anywhere from 2 to 5 secs of CPU time, depending on the system load.
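
For reference, the benchmark operation can be written in a few lines. The sketch below is a plain software version for illustration only; it is not the POPEYE routine, and the border handling (border pixels left at 0) is an assumption.

```python
def convolve3x3(image, mask):
    """Convolve a 2-D image (list of rows) with a 3 x 3 mask; borders stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # True convolution flips the mask relative to the image.
                    acc += mask[1 - dr][1 - dc] * image[r + dr][c + dc]
            out[r][c] = acc
    return out


# A single bright pixel reproduces the mask around its position.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1
mask = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = convolve3x3(impulse, mask)
print([row[1:4] for row in result[1:4]])  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each interior output pixel costs nine multiply-accumulate steps, which is the repetitive structure that makes the operation a natural candidate for hardware.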

4.2.  Adaptive  Modeling 

In this task, we would like to model the image using some data dependent model. An example would be
a 2-D auto-regressive (AR) model. The data dependency of the algorithm does not allow an efficient
hardware implementation, so the analysis-type machines do not perform well. It may be necessary to piece
together the algorithm from lower level routines but this assembly seldom allows the user to efficiently utilize
the pipelined architecture of the system. The display-type machine does not perform any worse than in the
convolution problem since both tasks must be programmed in software. Again the system's load will
determine its performance.


The POPEYE vision system offers a few advantages over the other systems: first, due to its large main
memory space, it can keep the entire image in RAM, allowing a floating-point number per pixel if necessary;
second, the PTP may perform the raw computations on the image while an IPU determines the model
parameters; third, the user still controls the data flow through the MPU so intermediate results may be made
available to him.

4.3.  Histogram  Modification 

In this task a pixel by pixel (or point) transformation is done on the image. Unless the transformation is
fixed and doesn't depend on the raw data, two phases are necessary: calculation of the histogram and pixel
modification.

An analysis-type machine could implement the two phases in a pipeline of processes, making it possible
to achieve real-time rates (30 msecs) unless the modification function is complex and data dependent. In that
case there is an intermediate step of calculating the function which would be handled by a programmable
processor. For the display-type system, the user must program both phases separately and probably write
temporary files between them; although easy to do, this approach is time consuming.

CMU's POPEYE vision system would use one IPU to calculate the histogram and the modifying function
while another uses the results to perform the pixel modification. The two IPUs would then operate as a
pipeline. If the modification to be performed is simple equalization, processing times as low as 50 msecs per
image may be obtained.
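
The two phases can be illustrated with a small equalization sketch. It is written as sequential Python rather than as a two-IPU pipeline, and the particular equalization rule (scaling the cumulative histogram to the output range) is one common choice assumed here, not necessarily the one used on POPEYE.

```python
def histogram(image, levels=256):
    """Phase 1: count how many pixels take each gray level."""
    counts = [0] * levels
    for row in image:
        for v in row:
            counts[v] += 1
    return counts


def equalization_table(counts):
    """Derive the modifying function from the cumulative histogram."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty image")
    top = len(counts) - 1
    table, running = [], 0
    for n in counts:
        running += n
        table.append(top * running // total)
    return table


def apply_table(image, table):
    """Phase 2: point transformation of every pixel through the table."""
    return [[table[v] for v in row] for row in image]


image = [[0, 0], [1, 1]]            # dark, low-contrast 4-level image
table = equalization_table(histogram(image, levels=4))
print(table)                        # [1, 3, 3, 3]
print(apply_table(image, table))    # [[1, 1], [3, 3]]
```

Splitting the work this way mirrors the text: the table can be computed from one frame by one processor while another applies the previous table to pixels.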

4.4.  Connectivity 

In this problem we try to decide whether a pixel belongs to a cluster of pixels or not. A criterion, typically
similarity in intensity value, is used to determine if a pixel is part of any of the known clusters. This is a data
dependent operation and is therefore difficult to implement in hardware unless the image is binary. A
display-type system is programmed to perform the algorithm and its execution speed depends only on the raw
speed and load of the host computer.

The implementation in the POPEYE vision system is straightforward due to the logical transform
operations available in the PTP subsystem. The PTP will execute an optimized connectivity algorithm in less than
100 msecs.
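
As an illustration of the connectivity problem itself (not of the optimized PTP algorithm, which is not described here), the sketch below grows clusters by breadth-first search. The 4-connected neighborhood and the criterion "adjacent pixels differ by at most `tolerance`" are assumptions chosen for simplicity.

```python
from collections import deque


def label_clusters(image, tolerance):
    """Label 4-connected clusters of pixels with similar intensity."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet labelled"
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and abs(image[nr][nc] - image[r][c]) <= tolerance):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels, next_label


image = [[10, 11, 200],
         [10, 12, 201],
         [90, 90, 202]]
labels, count = label_clusters(image, tolerance=5)
print(count)   # 3
print(labels)  # [[1, 1, 2], [1, 1, 2], [3, 3, 2]]
```
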

4.5. Conclusions

It was shown that POPEYE compares very well with other architectures when dealing with image
processing tasks. Even though it is in general slower than the analysis-type systems, its programmability makes it an
ideal candidate to evaluate different vision algorithms. A few examples will be presented in section 5.

It should be mentioned that even though the display-type system exhibited lower performance than the
other two systems, it is often supplied with a library of functions directly callable from an application
program. This type of system is also not limited by memory, which makes it very well suited to off-line image
processing like satellite cartography or multi-color imaging such as that used in medical applications.

On a system like POPEYE, the user must develop all the software (at least once), which often takes a
considerable amount of time and effort. One of the advantages is that it is possible to clone similar systems
(possibly scaled down versions) to be used in the field.

5.  Examples 


5.1. Cellular Logic Operations

A large number of image processing problems may be solved with simple binary images. The main
problem with binary vision systems is that light variations affect the choice of threshold. The vision system,
being a gray level system, deals with these problems in a very simple way: it presents the gray level image to
the user, allowing him to choose the threshold based on any criterion he wants. Furthermore, the image is
typically kept with the full 8 bits of resolution so another threshold may be chosen at a later time.

One of the reasons why someone may want to solve a problem via binary vision is that all the possible
operations with binary pixels are boolean in nature and thus capable of being performed in hardware. Preston
et al. w- ,h have defined a number of elementary neighborhood operations for binary images. They are based
on two local measures on a neighborhood: the factor number (f-num) and the crossing number (c-num).

In the local neighborhood of a pixel, the f-num will be the number of 1s found while the c-num will be
the number of 1-0 or 0-1 transitions found while traversing the neighborhood in the clockwise direction.
Based on the f-num and the c-num of a pixel (say $u_{ij}$), two boolean variables $f_{ij}$ and $c_{ij}$ are defined as

$$f_{ij} = \begin{cases} 1 & \text{iff (f-num of } u_{ij}) > \varphi \\ 0 & \text{otherwise} \end{cases}$$

$$c_{ij} = \begin{cases} 1 & \text{iff (c-num of } u_{ij}) \geq \xi \\ 0 & \text{otherwise} \end{cases}$$

where $0 \leq \varphi \leq 8$ and $0 \leq \xi \leq 9$ are the two thresholds that determine the properties of the particular cellular
logic operator (CLO). The two most common CLOs are the reduce (RED) operator and the augment (AUG)
operator.

The RED operator is defined by the boolean equation
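
The displayed equation is not legible at this point. A reconstruction that is consistent with the remark about the AND operator, and that is the exact dual of the AUG definition given for the inverse operator (complementing the image turns one into the other), would be the following; it should be read as an inference, not as the authors' printed formula:

$$v_{ij} = u_{ij} \land (f_{ij} \lor c_{ij})$$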

Note that due to the AND operator, only pixels which were originally 1 may change (to 0). If we use the
convention that a region consists of 1s embedded in a background of 0s, the number of pixels in a region may
only be reduced (hence the name of the operator). The inverse operator (AUG) may only change pixels that
were 0 (to 1) and is defined as

$$v_{ij} = u_{ij} \lor \lnot (f'_{ij} \lor c_{ij})$$

where $f'_{ij}$ is f-num redefined so it counts the number of 0s in the neighborhood. That is,

$$f'_{ij} = \begin{cases} 1 & \text{iff } (\mathrm{MAX} - (\text{f-num of } u_{ij})) > \varphi \\ 0 & \text{otherwise.} \end{cases}$$

Here MAX is the number of pixels in the neighborhood. Preston16 has shown the behaviour of the RED
CLO with different threshold combinations.

The PTP has been designed to execute both CLOs very rapidly (around >00 msecs per CLO over a 256 x
256 pixel image). Furthermore, we are currently studying the extension of the cellular logic ideas to gray level
images and the PTP will be just as fast with gray level data. In the next section we'll present an example that
utilizes the cellular logic operations.
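
A software sketch of the two operators may help fix the definitions. Several details are assumptions rather than facts from the text: the neighborhood is taken to be the 8 surrounding pixels (so MAX = 8), pixels outside the image are read as 0, the comparisons are f-num > phi and c-num >= xi, and RED is taken as the dual of the AUG equation, v = u AND (f OR c). The function names are illustrative.

```python
# 8-neighborhood in clockwise order, starting at the upper-left neighbor.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def neighbors(img, r, c):
    """Neighbor values in clockwise order; outside the image counts as 0."""
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc]
            if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in RING]


def f_num(ring):
    """Factor number: how many neighbors are 1."""
    return sum(ring)


def c_num(ring):
    """Crossing number: 0-1 and 1-0 transitions around the closed ring."""
    return sum(ring[k] != ring[(k + 1) % 8] for k in range(8))


def red(img, phi, xi):
    """Reduce: a 1 survives only if f or c holds (assumed dual of AUG)."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            ring = neighbors(img, r, c)
            keep = f_num(ring) > phi or c_num(ring) >= xi
            row.append(1 if img[r][c] and keep else 0)
        out.append(row)
    return out


def aug(img, phi, xi):
    """Augment: v = u OR NOT (f' OR c), with f' counting 0s (MAX = 8)."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            ring = neighbors(img, r, c)
            f_prime = (8 - f_num(ring)) > phi
            crossing = c_num(ring) >= xi
            row.append(1 if img[r][c] or not (f_prime or crossing) else 0)
        out.append(row)
    return out


speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1
print(red(speck, phi=2, xi=9)[2][2])   # 0: the isolated pixel is removed

hole = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(aug(hole, phi=2, xi=9)[1][1])    # 1: the one-pixel hole is filled
```

Setting xi to 9 disables the crossing-number term, since the crossing number of an 8-pixel ring never exceeds 8.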

5.2.  Gradient  Segmentation 

POPEYE has been used to implement the Piecewise Gradient Segmentation Algorithm19 illustrated in
Figure 5-1. The algorithm consists of six major steps.

1. ONE DIMENSIONAL FEATURE EXTRACTION. The algorithm starts by extracting one dimensional
features from the original image. In each of the two major directions, along rows and columns,
the image is analyzed. The image is modeled using fixed-length blocks of pixels to make the
procedure less computationally expensive. For each block we calculate the mean intensity, the
standard deviation from the mean and the slope of the best linear regression fit to the pixels of the
block. This slope is related to the intensity gradient component in the modeled direction. This first
step is implemented on the IPUs with each one modelling in one of the two major directions.

2. GRADIENT TO INTENSITY MAPPING. The output of the previous step is an array of models for each
of the two analyzed directions: horizontal and vertical. From the slopes of the linear regression fit
we generate a slope map, an intensity display of the model slopes where the largest positive slopes
are assigned the maximum brightness value of 255 and the largest negative slopes are assigned the
minimum value of 0. Thus the pixels with a value of 128 belong to regions of constant intensity
(no intensity slope). This step and the previous step are implemented simultaneously on the IPUs.

Figure 5-1: Block Diagram of the Piecewise Gradient Segmentation Algorithm

3. THRESHOLDING OF THE SLOPE MAP. Each slope map is next thresholded to obtain up to five binary
images corresponding to regions of zero slope, small positive and negative slopes and large
positive and negative slopes. Although preliminary results have shown that the threshold is not
strongly dependent on the lighting conditions, it is nevertheless a data-dependent operation. The
thresholding is done by the PTP at the same time it performs the first cycle of the next step:
cellular logic operations.

4. CELLULAR LOGIC OPERATIONS ON THE BINARY IMAGES. This step uses the cellular logic operations
described in the previous example. An AUG cycle with factor number of two followed by a RED
cycle with the same parameters are done first to filter spurious blocks set to 1 by noise or
inaccuracies in the modeling. Then eight AUG cycles with factor number of four followed by eight
RED cycles are used to smooth the ragged regions obtained from the simple thresholding. This
stage in the algorithm is performed in the PTP as discussed previously.

5. CONNECTIVITY ANALYSIS. Once the regions have been cleaned up, we proceed to extract their two
dimensional geometrical features (area, perimeter, center of gravity, first- and second-moment
invariants and first cross moment) along with a description of their spatial relations with one
another. A fast one-pass algorithm has been designed to be used in the PTP as discussed in the
performance section. The IPUs retrieve the results from the PTP and add to them the typical
model parameters (mean intensity and standard deviation) so the MPU can retrieve all the
information from the IPUs' image page.

6. GENERATION OF A RELATIONAL DESCRIPTION. Finally, a structural description of all the slope
regions (up to five in each direction) is formed in memory by the MPU. This representation may
be used to classify an object, determine its orientation or even perform scene interpretation as
explained in reference 19.
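
Steps 1 and 2 above can be sketched for a single image row as follows. This is an illustration under assumptions: the population form of the standard deviation is used, blocks are non-overlapping, and the exact slope-to-brightness scaling is a guess chosen so that the largest negative slope maps to 0, zero slope to 128 and the largest positive slope to 255. The function names are hypothetical.

```python
from math import floor, sqrt


def block_model(pixels):
    """Mean, standard deviation and least-squares slope of one block (len >= 2)."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (p - mean) for x, p in enumerate(pixels))
    return mean, std, sxy / sxx


def row_models(row, block_len):
    """Model one image row with consecutive fixed-length blocks."""
    return [block_model(row[i:i + block_len])
            for i in range(0, len(row) - block_len + 1, block_len)]


def slope_to_intensity(slope, max_abs_slope):
    """Map a slope in [-max, +max] to 0..255 with zero slope at 128."""
    if max_abs_slope == 0:
        return 128
    return floor(127.5 * (1 + slope / max_abs_slope) + 0.5)


row = [0, 2, 4, 6, 5, 5, 5, 5]          # a ramp followed by a flat region
models = row_models(row, block_len=4)
slopes = [m[2] for m in models]
biggest = max(abs(s) for s in slopes)
print(slopes)                                             # [2.0, 0.0]
print([slope_to_intensity(s, biggest) for s in slopes])   # [255, 128]
```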

Figure 5-2 shows the photograph of a paper cup lighted from one side. It is easy to see how the shading
makes it impossible for simple thresholding to provide an adequate representation of the object. The figure
also shows two of the five possible regions obtained from the horizontal models; they correspond to the small
positive and negative slopes.

5.3.  Automatic  Focusing 

Several automatic focusing algorithms have been used by various researchers in the past, all of which
depend on a quality of focus criterion whose value is monotonically related to the high frequency content of
the image. It is usually assumed that the point of best focus lies at the point of largest high frequency content.
Horn39 at MIT used a one dimensional FFT whose input points were circularly arranged in the image.
Tenenbaum40 at Stanford used a thresholded version of the Sobel gradient operator. Both were successful.

Several focusing methods are described below.

• Histogram Entropy Minimization. The histogram is tallied over a window of the image and
its entropy computed. The sharper the focus of the image, the more definite the peaks in the
histogram become. The entropy, a measure of the "randomness" of a probability density function,
is related to the shape of the peaks. In image processing, we use the histogram as an estimate of
the probability density.

Figure 5-2: Small positive and negative slope regions of a paper cup (photo).

• High Frequency Content Maximization. All the focusing algorithms described here
somehow depend on high frequency content, but none so obviously as the Fourier Transform. The
usual scheme is to compute a one or two dimensional FFT, estimate the power spectrum density
from the squared magnitude of the FFT, sum the high frequency terms, and then maximize the
sum by refocusing.

• Thresholded Gradient Magnitude. The steepness of dark to light and light to dark
transitions in an image is dependent on the quality of focus. In two dimensions, the steepness is found
by computing the gradient. By summing the gradient estimates over a window of the image,
another estimate of the quality of focus is obtained. Unfortunately, since the gradient sum is
constant by definition, the gradient estimates obtained at each point must be thresholded, thereby
making the operation nonlinear. The nonlinearity makes the algorithm difficult to analyze.

• Adaptive Segmentation. One of the newer schemes for describing an image has been
developed recently here at CMU, and is referred to as adaptive segmentation. This is a
generalization of the gradient segmentation algorithm described previously.

Typically, an image will contain large homogeneous sections. The general idea of segmentation is
to cluster all the pixels in these sections into one bin, thereby reducing the amount of data which
needs processing. The hard part is defining what we mean by homogeneous. Several successful
ideas have been tried so far, and some seem to be applicable to focusing. In particular,
descriptions that yield information concerning the variance of the pixel values in certain areas can be
used to extremize the variance, thereby focusing the input image.

• Cellular Logic. One of the most attractive features of cellular logic is its deftness at edge
detection. Edges are the single most important features of images which strive to be in focus, and
successful attempts at automatic focusing using cellular logic have already been made in the image
processing laboratory of a nearby hospital. The insights gained from study there are being applied
to the focusing problem at CMU.

The histogram entropy and thresholded gradient magnitude algorithms have been implemented. Due to
aliasing in the spatial frequency domain, the histogram entropy algorithm is useful only in the region near the
point of best focus, but runs very quickly. The gradient algorithm is slower by a factor of approximately 5, but
focuses as well as humans can.
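
The two implemented criteria can be sketched as simple focus measures. These are illustrative stand-ins, not the POPEYE routines: the gradient is estimated with forward differences and an absolute-value magnitude instead of the Sobel operator, and the threshold value and test windows are made up for the example.

```python
from math import log2


def histogram_entropy(window, levels=256):
    """Entropy (in bits) of the gray-level histogram; lower suggests sharper focus."""
    counts = [0] * levels
    n = 0
    for row in window:
        for v in row:
            counts[v] += 1
            n += 1
    return -sum((k / n) * log2(k / n) for k in counts if k)


def thresholded_gradient_sum(window, threshold):
    """Sum of gradient magnitudes that reach the threshold; higher suggests sharper."""
    total = 0
    for r in range(len(window) - 1):
        for c in range(len(window[0]) - 1):
            gx = window[r][c + 1] - window[r][c]
            gy = window[r + 1][c] - window[r][c]
            magnitude = abs(gx) + abs(gy)  # cheap stand-in for the true magnitude
            if magnitude >= threshold:
                total += magnitude
    return total


sharp = [[0, 0, 255, 255] for _ in range(4)]    # abrupt edge
blurred = [[0, 85, 170, 255] for _ in range(4)]  # same edge, smeared out

print(histogram_entropy(sharp), histogram_entropy(blurred))  # 1.0 2.0
print(thresholded_gradient_sum(sharp, 100),
      thresholded_gradient_sum(blurred, 100))                # 765 0
```

A focusing loop would step the lens, recompute one of these measures over the window, and keep the lens position that minimizes the entropy or maximizes the gradient sum.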

5.4.  Adaptive  Spatial  Filtering 

Often an image has enough noise in it to foil whatever algorithm is attempting to make sense of it. The
natural thing to try is removing the noise. By far the most common technique used by image processing
wizards to reduce the amount of noise present in an image is spatial averaging. The two algorithms most often
used are the simple four and eight pixel replacements of Equations 1 and 2, where the pixels are labelled as in
Figure 5-3.

Figure 5-3: Pixel Map for the Standard Spatial Averaging Algorithms

The action of the spatial filtering algorithms is easily interpreted in the context of Laplace's equation.
Consider the intensity of an image as a function of the two spatial variables as a surface in three dimensional
space. To reduce noise, what's needed is to minimize the curvature of the surface at every point. The best we
can hope for is zero curvature, so we set some estimate of the curvature to zero. This is exactly what Laplace's
equation does (Equation 3).

Equation 4 is one of the most grotesque yet still acceptable approximations to the second derivative available.

Combination of Equations 3 and 4 yields Equation 1, the four pixel averaging scheme. The eight pixel scheme
comes from taking into account the derivatives in the diagonal directions as well.
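
The derivation just described can be written out as a sketch. The notation is an assumption and may differ from the pixel labelling of Figure 5-3: I(m, n) denotes the intensity at row m and column n, the pixel spacing is taken as one, and the second derivative is estimated with the usual three-point central difference, which is presumably the approximation meant by Equation 4.

```latex
% Laplace's equation for the intensity surface I(x, y) (cf. Equation 3)
\frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} = 0

% Three-point estimates of the second derivatives, unit spacing (cf. Equation 4)
\frac{\partial^2 I}{\partial x^2} \approx I(m+1,n) - 2\,I(m,n) + I(m-1,n), \qquad
\frac{\partial^2 I}{\partial y^2} \approx I(m,n+1) - 2\,I(m,n) + I(m,n-1)

% Substituting into Laplace's equation and solving for the center pixel (cf. Equation 1)
I(m,n) = \tfrac{1}{4}\left[\, I(m+1,n) + I(m-1,n) + I(m,n+1) + I(m,n-1) \,\right]
```

Under these assumptions, replacing each pixel by the mean of its four nearest neighbors is the same as forcing the discrete estimate of the curvature to zero at that pixel.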

The principal drawback inherent in spatial averaging is the tendency to blur the image. Since the
processed value of each pixel depends on the values of its neighbors as well as on its own, the energy in the
image spreads out after each filtering pass. Both algorithms are actually low-pass filters, and may be analyzed
as such. In the four pixel case, the z-transform of Equation 1 yields Equation 5, where z1 and z2 are the
z-transform variables of m and n.
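
As a rough numerical illustration of this analysis, the sketch below evaluates a transfer function on the unit circle. It assumes that Equation 1 replaces each pixel by the mean of its four nearest neighbors, so that the transfer function is H(z1, z2) = (z1 + 1/z1 + z2 + 1/z2) / 4; that form, and the names `h4` and `gain4`, are assumptions rather than the authors' Equation 5.

```python
import cmath
import math


def h4(z1, z2):
    """Transfer function of the assumed four-neighbor averaging filter."""
    return (z1 + 1 / z1 + z2 + 1 / z2) / 4


def gain4(w1, w2):
    """Frequency response at spatial frequencies (w1, w2), in radians per pixel."""
    return h4(cmath.exp(1j * w1), cmath.exp(1j * w2))


# On the unit circle the response reduces to (cos w1 + cos w2) / 2.
dc_gain = gain4(0.0, 0.0)
mid_gain = gain4(math.pi / 2, math.pi / 2)
```

Under this assumption the gain is 1 at zero frequency and falls to 0 where cos w1 + cos w2 = 0, which is the low-pass behavior described above. At (pi, pi) the gain is -1, so a one-pixel checkerboard would be inverted rather than attenuated by a filter of exactly this form.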


By incorporating some "intelligence" into the filtering algorithm, it's possible to remove noise in certain
areas of the image while leaving others untouched. For example, homogeneous areas of the image could be
filtered without sacrificing edge character, an operation clearly needed when performing edge or line detection.
This type of smart filter, called an adaptive spatial averaging, or ASA filter, is actually two filters: one
which decides which areas of the image are to be filtered, and another which performs the filtering.
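
A minimal sketch of the two-filter idea follows. It is illustrative only: the function names are hypothetical, the Sobel edge strength is taken as |gx| + |gy|, the eight-point average excludes the center pixel, and border pixels are left untouched. None of these details are specified in the text.

```python
def sobel_strength(image, r, c):
    """Sobel edge strength at (r, c), taken here as |gx| + |gy|."""
    gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
          - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
    gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
          - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
    return abs(gx) + abs(gy)


def average8(image, r, c):
    """Mean of the eight neighbors of (r, c), truncated to an integer."""
    total = sum(image[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0))
    return total // 8


def adaptive_average(image, threshold=200):
    """Average only the pixels whose edge strength is at most `threshold`."""
    out = [row[:] for row in image]
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            # First filter: decide whether this pixel lies in a smooth area.
            if sobel_strength(image, r, c) <= threshold:
                # Second filter: perform the averaging.
                out[r][c] = average8(image, r, c)
    return out


# A vertical step edge (10 | 200) with one noisy pixel in the flat region.
image = [[10, 10, 10, 10, 200, 200, 200, 200] for _ in range(5)]
image[2][1] = 18
result = adaptive_average(image)
```

In this example the noisy pixel at (2, 1) is pulled back to 10, while the two columns on either side of the step keep their original values. Unconditional averaging would instead change the pixel at (2, 3) from 10 to 81.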

A small interpretive language to implement the idea of two pass filtering was written with the aid of the
compiler writing tools on UNIX. Figure 5-4 gives a syntax summary of the language. A small set of utility
commands is included to avoid returning to the support program every time the user wants to do something
simple like clearing or updating the screen. A simple conditional statement and a library of filtering functions
enable the processing engine to use one filter to select certain pixels for processing by a second filter, or to
mark the selected pixels so the user can see what's going on. Currently implemented filters include the Sobel
edge operator and several low and high pass convolution kernels. lp8, for example, is an eight point
neighborhood average.

command:    <simplecmd> or <filter> or <statement>
simplecmd:  read, show, clear, pause, sleep <n>, quit or ^D
filter:     lp4, lp8, hp4, hp8, pixel or sobel
statement:  clip <op> <n> or
            if <cond> then <action> or
            <variable> = <n>
cond:       <filter> <op> <n> or (cond)
op:         <, <=, >, >=, = or !=
action:     <filter> or mark <n>
n:          an integer

Figure 5-4: Syntax of the Adaptive Spatial Filtering Language

The usefulness of the language is certainly not limited to ASA operations, since the library of filters can
be easily expanded. It is our intention to extend the capabilities of the language in the near future. The
following are two examples of input to the interpreter.

|
| Produce a binarized edge map of the image.
|
sobel                              | Run the Sobel edge operator.
read                               | Read the new image into memory.
if (pixel < 200) then mark 0       | Mark low edge-strength pixels black.
if (pixel >= 200) then mark 255    | Mark high edge-strength pixels white.


Alternatively, the program given below produces the same results with less computation, since it makes
only two passes over the window.

|
| Produce a binarized edge map of the image (fast version).
|
if (sobel < 200) then mark 0       | Mark low edge-strength pixels black.
read                               | Read the new image into memory.
if (pixel > 0) then mark 255       | Mark high edge-strength pixels white.


The second example marks pixels with a high edge strength, pauses, updates the screen and then filters
all the pixels with a low edge strength using a low pass filter. The result is that only the homogeneous or
slightly shaded areas of the image undergo spatial averaging.

|
| Adaptive Spatial Averaging Example
|
if (sobel > 200) then mark 255     | Show which pixels will be filtered.
pause                              | Let the user look for a bit.
show                               | Put the old image back up.
if (sobel <= 200) then lp8         | Perform the ASA passes.
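
To make the semantics of the conditional statement concrete, here is a small Python sketch of how such an interpreter might evaluate the listings above. It is not the UNIX implementation. The class name, the split between a `memory` image that filters read and a `screen` image that actions write, the use of "|" as the comment delimiter, the clamping to 0..255, and the Sobel strength |gx| + |gy| are all assumptions, and only `read`, bare filters, and `if ... then ...` are handled.

```python
import operator
import re

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
RING8 = ((1, 1, 1), (1, 0, 1), (1, 1, 1))


def weighted_sum(img, r, c, kernel):
    """Sum of the 3x3 neighborhood of (r, c) weighted by `kernel`."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


# Filter library: each entry maps (image, row, column) to a value.
FILTERS = {
    "pixel": lambda img, r, c: img[r][c],
    "sobel": lambda img, r, c: (abs(weighted_sum(img, r, c, SOBEL_X))
                                + abs(weighted_sum(img, r, c, SOBEL_Y))),
    "lp8": lambda img, r, c: weighted_sum(img, r, c, RING8) // 8,
}

STATEMENT = re.compile(
    r"if\s*\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(\d+)\s*\)\s*then\s+"
    r"(?:mark\s+(\d+)|(\w+))$")


class Interpreter:
    def __init__(self, image):
        self.memory = [row[:] for row in image]  # image the filters read
        self.screen = [row[:] for row in image]  # frame the actions write

    def run(self, line):
        line = line.split("|")[0].strip()        # drop a trailing comment
        if not line:
            return
        if line == "read":                       # copy the screen into memory
            self.memory = [row[:] for row in self.screen]
        elif line in FILTERS:                    # bare filter: every pixel
            self._apply(lambda r, c: True, FILTERS[line])
        else:
            match = STATEMENT.match(line)
            if match is None:
                raise ValueError(f"unsupported statement: {line!r}")
            name, op, n, mark, action = match.groups()

            def selected(r, c):
                return OPS[op](FILTERS[name](self.memory, r, c), int(n))

            if mark is not None:
                self._apply(selected, lambda img, r, c: int(mark))
            else:
                self._apply(selected, FILTERS[action])

    def _apply(self, selected, action):
        """Write `action` to the screen at every selected interior pixel."""
        for r in range(1, len(self.memory) - 1):
            for c in range(1, len(self.memory[0]) - 1):
                if selected(r, c):
                    value = action(self.memory, r, c)
                    self.screen[r][c] = max(0, min(255, value))


def interior(img):
    """The image without its one-pixel border, which is never written."""
    return [row[1:-1] for row in img[1:-1]]


step = [[10, 10, 10, 10, 200, 200, 200, 200] for _ in range(5)]

slow = Interpreter(step)
for text in ["sobel", "read",
             "if (pixel < 200) then mark 0",
             "if (pixel >= 200) then mark 255"]:
    slow.run(text)

fast = Interpreter(step)
for text in ["if (sobel < 200) then mark 0 | low edge strength",
             "read",
             "if (pixel > 0) then mark 255"]:
    fast.run(text)
```

On this step image the two binarized edge map programs leave the same interior pixels on the screen, which is consistent with the claim that the shorter program produces the same result. As the `pixel > 0` test suggests, the fast version would differ for an edge pixel whose original gray level is exactly 0.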


6.  Conclusions 

The POPEYE vision system described in this paper has been developed at CMU as an experimental tool
for the study of visual inspection, object orientation, object classification, and interactive control tasks. The
design goals of the system were to provide flexibility in the development of algorithms and systems concepts
with reasonable speed of performance and moderate cost. The resulting hardware/software system now serves
as a semi-portable stand-alone system which may conveniently be utilized in different laboratories for studies
of specific applications. The POPEYE system provides an integrated gray-level vision system capability for the
Flexible Assembly Laboratory and is used in conjunction with robotic manipulators, a binary vision system,
and tactile and force sensors for sensor-based control and assembly experiments.


The capabilities of the POPEYE system are evolving through the addition of custom boards. The multiple
bus architecture offers useful alternatives for the design of boards with varying complexity and cost. As
specific strategies for recognition and interpretation of images for industrial applications evolve, we anticipate
refined implementation of hardware and software mechanisms for these purposes. Recent applications of the
system to industrial problems have included the characterization of a coating process using variance measures
of local texture, inspection of glass integrity using edge-following techniques, the determination of object
orientation for robot acquisition using piecewise gradient modeling and histogram modification methods, and
the validation of assembly procedures using image subtraction to isolate component parts under manipulator
control.